ABSTRACT

A hierarchical system for object-based audiovisual descriptive tagging of images for information retrieval, editing, and manipulation, includes: an object-based selection mechanism for selecting an object of interest in said image; hierarchical data structure generation means for generating a hierarchical data structure for said image and for associating auxiliary information with said image; and a transmission/storage mechanism for storing the image and the hierarchical data structure. A hierarchical method for object-based audiovisual descriptive tagging of images for information retrieval, editing, and manipulation, includes: selecting an object of interest in said image with an object-based selection mechanism; generating a hierarchical data structure for said image and for associating auxiliary information with said image; and transmitting/storing the image and the hierarchical data structure.

28 Claims, 3 Drawing Sheets 
HIERARCHICAL METHOD AND SYSTEM FOR OBJECT-BASED AUDIOVISUAL DESCRIPTIVE TAGGING OF IMAGES FOR INFORMATION RETRIEVAL, EDITING, AND MANIPULATION

This application claims benefit of Provisional Application 60/061,405, Sep. 29, 1997.

FIELD OF THE INVENTION

This invention relates to systems that associate information with images and utilize such information in content-based information retrieval, object-based editing and manipulation applications, and a method of manipulating information in such systems.

BACKGROUND OF THE INVENTION

Associating information with images is useful to enable successful identification of images and the interchange of images among different applications. When associated information is audiovisually rendered in addition to the image data itself, images may be utilized and enjoyed in new ways. In known methods and systems, such information is generally global in nature, i.e., it applies to the entire image without distinguishing between different objects (e.g., person versus background, or different persons) in the image. An example of a file format that has been developed by standardization bodies, that allow global information attachment to images, is Still Picture Interchange File Format (SPIFF), specified as an extension to the JPEG standard, ISO/IEC IS 10918-3 (Annex F).

In known systems, information is simply "pushed" to the user with no provisions for interactivity. Known systems do not address audio-visualization of content information at all; they are geared towards classical image database or image file exchange applications. There is no way for the user to learn additional information about the subject of the image as displayed.

SUMMARY OF THE INVENTION

The hierarchical system for object-based audiovisual descriptive tagging of images for information retrieval, editing, and manipulation, of the invention includes: an object-based selection mechanism for selecting an object of interest in an image; a hierarchical data structure generation means for generating a hierarchical data structure for the image and for associating auxiliary information with the image; and a transmission/storage mechanism for storing the image and the hierarchical data structure.

The hierarchical method of the invention for object-based audiovisual descriptive tagging of images for information retrieval, editing, and manipulation, includes: selecting an object of interest in an image with an object-based selection mechanism; generating a hierarchical data structure for the image and for associating auxiliary information with the image; and transmitting/storing the image and the hierarchical data structure.

It is an object of the invention to develop a hierarchical data structure and method that enables association of descriptive data to an image.

Another object of the invention is to provide a system and method where the descriptive data may be specific to objects in the image and may include textual information, links to other files, other objects within the same image or other images, or links to web pages, and object features, such as shape, and audio annotation.

A further object of the invention is to provide a system and method that provides a means for creation of image content-related information, forming the data structure containing this information, and means for experiencing this information. Such systems may include a camera, or a camera connected to a personal computer, or any information appliance with image acquisition or generation, viewing, and handling capabilities. In the above, the term "experiencing" refers to audio-visually observing image-content related information by display and playback, and "utilizing" refers to editing, archiving and retrieving, manipulating, re-purposing and communication of images.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the major components of the system of the invention.

FIG. 2 is a block diagram of a content-based information retrieval system.

FIG. 3 is a block diagram depicting an object-based image editing method.

FIG. 4 depicts the file structure of the preferred embodiment.

FIG. 5 depicts integration of the hierarchical data structure with image data using JFIF file format.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

This invention provides a system and method for (i) defining object-based information about regions within a digital image, (ii) structuring and integrating such information to a common file format that contains the image data itself, and (iii) utilizing such information in content-based information retrieval, object-based editing and manipulation applications.

The method of the invention is designed to work with any image compression standard, such as the current JPEG standard, as well as future versions of JPEG, such as JPEG2000. Associating information about bounding rectangles of different image objects, as well as precise contour data are among the unique features of this invention. An important feature of the invention is that the hierarchical data structure and the content-related information is downloaded and presented to a user only at the user's request. An object-based paradigm is provided. The system and method supports new types of content-related information such as Web pages and object boundary information. A linking mechanism which may link an image or a region/object in an image to any other local or remote multimedia content is provided. The newly defined format is backwards compatible with existing systems.

The invention uses an object-based paradigm as opposed to the frame-based, i.e., information refers to the entire image without enabling the possibility for distinguishing among different image objects, paradigms of known systems.

The major components of an embodiment of a system of the invention are depicted in FIG. 1, generally at 10. In this embodiment, an image 12 is acquired and/or generated. The image may be acquired by a camera, generated by a computer, or may be an existing image. Once the image is acquired, object selection 14 may be performed interactively by drawing rectangles that enclose objects of interest. Rectangles may be drawn on an LCD via pen stylus input, in the case where image 12 acquisition or generation occurs in a camera or on a computer, respectively. Alternatively, object
selection may be performed on a computer platform to which digital images are downloaded. Object-based information input 14 may be performed via pen input for textual and link information. Audio annotation may be input via a microphone that may be integrated to the camera to allow annotation during the acquisition process. It is also possible to feature a speech recognition module in the camera and input textual information via speech using speech-to-text conversion. A compression module 15 includes an audio compression mechanism 15a and a data compression mechanism 15b. Compression of audio annotation using a standard audio compression method (e.g., Delta Pulse Coded Modulation (DPCM)) and compression of other associated data using a standard data compression method (e.g., Lempel-Ziv-Welch (LZW)) are optional.

Generation of a hierarchical data structure 16 containing the information in two levels, where the first layer is called the "base layer", is described later herein. An integration module 17 combines content related data and the image data itself into a common file in the preferred embodiment. This combination may be supported as a native part of a future image file format, such as, for example, that which may be adopted by JPEG2000 or MPEG4. It is also possible, however, to use currently existing standard file formats by extending them in a proprietary fashion. The latter will provide backward compatibility in the sense that a legacy viewer using an existing file format may at least display the image, without breaking down, and ignore the additional information. This will be described later herein. An implementation with separate image and information files is also possible, with certain pros and cons, as will be described later in connection with FIG. 4. Integrated image-content and image data itself is then transmitted or stored, block 18, in a channel, in a server, or over a network.

Storage may be a memory unit, e.g., memory in an electronic camera, or in a server. Alternatively, the integrated data may be sent via Email, or as an attachment to an Email. Image compression module 20 is optional and may be provided to implement the JPEG standard, or any other image compression algorithm. If audio and/or the other associated data is compressed, decompression of audio and/or data is performed prior to audiovisual realization of the information in module 24. Once images and the hierarchical data structure associated with them are available to users, they may be utilized interactively.

Interactive Audiovisual Realization:

An interactive system utilizing the invention may follow the following steps to implement the retrieval and audiovisual realization of object information, block 24, associated with the image:

(a) retrieve and display the image data;

(b) read the base layer information;

(c) using the base layer information as an overlay generation mechanism, generate an overlay to visually indicate the regions that contain information in terms of "hot spots", according to the region information contained in the base layer. A hot spot may be only highlighted when user's pointing device points at a location within the area of that region;

(d) display pop-up menus by the objects as the user points and clicks on the hot spots, where the types of available information for that object are featured in the menus; and

(e) render the information selected by the user when the user clicks on the appropriate entry in the menu.

It is important to note that the hot spots and pop-ups are only invoked in response to user's request. In that sense, the additional information provided by this invention never becomes intrusive. Steps a-e are implemented by audiovisual realization of object information module 24, which contains appropriate computer software.

In a complete implementation of the invention, content-based image retrieval and editing are also supported. A search engine 28 is provided to allow the user to locate a specific image. Editing is provided by an object-based image manipulation and editing subsystem 26. Images 12 may be contained in a database which contains a collection of digital images therein. Such an image database may also be referred to as a digital library.

Content-based information retrieval provides users new dimensions to utilize and interact with images. First, the user may click on some regions/objects of interest in an image to retrieve further information about them. Such information may include: links to the related Web sites or other multimedia material, textual descriptions, voice annotation, etc. Second, the user may look for certain images in a database via advanced search engines. In database applications, images may be indexed and retrieved on the basis of associated information describing their content. Such content-based information may be associated with images and objects within images and subsequently used in information retrieval using the current invention.

Object-based image editing enables a user to manipulate images in terms of the objects in the images. For example, the user may "drag" a human subject in a picture, "drop" it to a different background image, and therefore compose a new image with certain desired effects. The current invention allows access to precise outline (contour) information of objects to enable cutting and dragging objects from one image to another where they may be seamlessly integrated to different backgrounds. Together, content-based information retrieval and object-based image editing offer a user new and exciting experience in viewing and manipulating images.

In the following, an integrated method to enable an image data structure to support content-based information retrieval and object-based image editing is disclosed. The method constructs a hierarchical data structure in which the "base layer" carries only content-related information indicators and is extremely light weight. The actual content-related information is carried in the second layer. The hierarchical implementation ensures that the downloading efficiency of compressed images is practically intact after introducing the new functionalities, while those functionalities may be fully realized when a user instructs so.

There are two major objectives when developing a method to support content-based information retrieval and object-based image editing. They are: 1) a compressed image which supports such functionalities should be able to be downloaded at essentially the same speed and stored using essentially the same disk space as if it does not support such functionalities; 2) such functionalities may be fully realized when a user/application elects to do so.

To fulfill the above objectives, a hierarchical data structure which has two layers is used. The first layer, referred to herein as the "base layer," contains up to a fixed number of bytes. Those bytes are used for specifying a number of regions of interest and storing a number of flags which indicate whether certain additional content-related information is available for a region. The second layer carries the actual content-related information. In a networking application, initially only the compressed image and the base layer of its associated content-related information are transmitted. Since the base layer carries only up to a fixed
small number of bytes, its impact on transmitting speed of the image may be negligible in practice.

TABLE 1

BASE LAYER SYNTAX

Syntax                                      Bits   Mnemonic
num_of_regions                              6      uimsbf
for (n = 0; n < num_of_regions; n++) {
    region_start_x                          N      uimsbf
    region_start_y                          N      uimsbf
    region_width                            N      uimsbf
    region_height                           N      uimsbf
    link_flag                               1      bslbf
    meta_flag                               1      bslbf
    voice_flag                              1      bslbf
    boundary_flag                           1      bslbf
    security_flag                           1      bslbf
    mpeg7_flag                              1      bslbf
}
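The bit layout of Table 1 can be illustrated with a short Python sketch. This is only one possible reading of the table, not the patent's implementation: the names `Region`, `pack_base_layer`, and `unpack_base_layer` are hypothetical, fields are written most significant bit first, and the padding of the final byte with zero bits is an assumption. N is taken as ceil(log2(max(image_width, image_height))), as the text defines it.

```python
from dataclasses import dataclass

# Flag order follows Table 1.
FLAG_NAMES = ("link", "meta", "voice", "boundary", "security", "mpeg7")

@dataclass
class Region:
    x: int
    y: int
    width: int
    height: int
    flags: dict  # flag name -> bool

def coord_bits(image_width, image_height):
    """N = ceil(log2(max(width, height))), computed with integer arithmetic."""
    return (max(image_width, image_height) - 1).bit_length()

def pack_base_layer(regions, image_width, image_height):
    """Serialize regions per Table 1; zero padding to a whole byte is an assumption."""
    n = coord_bits(image_width, image_height)
    if len(regions) > 63:
        raise ValueError("num_of_regions is a 6-bit field")
    bits = format(len(regions), "06b")
    for r in regions:
        for value in (r.x, r.y, r.width, r.height):
            if not 0 <= value < (1 << n):
                raise ValueError("value does not fit in N bits")
            bits += format(value, f"0{n}b")
        for name in FLAG_NAMES:
            bits += "1" if r.flags.get(name) else "0"
    bits += "0" * (-len(bits) % 8)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")

def unpack_base_layer(data, image_width, image_height):
    """Inverse of pack_base_layer."""
    n = coord_bits(image_width, image_height)
    bits = format(int.from_bytes(data, "big"), f"0{len(data) * 8}b")
    pos = 0

    def take(count):
        nonlocal pos
        value = int(bits[pos:pos + count], 2)
        pos += count
        return value

    regions = []
    for _ in range(take(6)):
        x, y, w, h = (take(n) for _ in range(4))
        flags = {name: bool(take(1)) for name in FLAG_NAMES}
        regions.append(Region(x, y, w, h, flags))
    return regions
```

Under these assumptions each region costs 4N + 6 bits, so four regions in a 65,536x65,536 image (N = 16) take 6 + 4 x 70 = 286 bits, or 36 bytes after padding. That is within the 38 bytes the text quotes; the slightly larger figure in the text may reflect framing or alignment that Table 1 does not show.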

Referring now to FIG. 2, after the initial downloading, a 
user may view the image 40, and may also decide to interact 
with the contents of the image. This may include interacting 
with an object of interest, such as character 1 (42), character 
2 (44) or another item, such as item 46. Alternately, a region 
of the image may be considered as an object of interest. The 
entire image also may be treated as an object of interest. The 
user may do so by "clicking" on regions or objects in which 
the user may be interested. The system will then display a 
pop-up menu 48, 50, which lists the available information 
related to the chosen region or object, based on the flags 
stored in the base layer. If the user selects one item in the 
menu, the system will then start downloading the related 
information stored in the second layer from the original 
source and display it to the user. The user may also choose 
to save a compressed image with or without its content-
related information. When the user chooses to save the
image with its content-related information, the flags corre-
sponding to the available information in the base layer will
be set to true, and vice versa.
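The click-and-menu interaction described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patent's software: `HotSpot`, `hit_test`, `menu_for`, and the menu labels are hypothetical names, and overlapping regions are resolved by taking the first match.

```python
from dataclasses import dataclass, field

# Hypothetical menu labels, keyed by the base layer flag that enables them.
MENU_LABELS = {
    "link": "Links",
    "meta": "Meta information",
    "voice": "Voice annotation",
    "boundary": "Boundary",
    "security": "Security/copyright",
    "mpeg7": "MPEG-7 reference",
}

@dataclass
class HotSpot:
    x: int
    y: int
    width: int
    height: int
    flags: dict = field(default_factory=dict)

def hit_test(hot_spots, px, py):
    """Return the first hot spot whose rectangle contains (px, py), else None."""
    for spot in hot_spots:
        inside_x = spot.x <= px < spot.x + spot.width
        inside_y = spot.y <= py < spot.y + spot.height
        if inside_x and inside_y:
            return spot
    return None

def menu_for(spot):
    """List only the entries whose flag is set in the base layer."""
    return [label for name, label in MENU_LABELS.items() if spot.flags.get(name)]
```

Because the menu is built from the flags alone, nothing from the second layer has to be fetched until the user picks an entry.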

An initial set of content-related information, which may be of common interest, includes: 1) links; 2) meta textual information; 3) voice annotation; and 4) object boundary. Additionally, 5) security-copyright information; and 6) references to MPEG-7 descriptors, as described in "MPEG-7: Context and Objectives (Version 4)," ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, N1733, July 1997, may be displayed (not shown). The syntax of Table 1 may be used to support the acquisition of content-related information. It should be noted that other types of content-related information may be added to this initial set as necessary in order to satisfy various applications. For example, a computer code, for instance written in Java® language, may be added to the list of associated information. In some cases, the system will open an already running application, such as a web browser or media player, or the system may be required to launch an application if the application is not already running. Such applications may take any form, such as a word processing application, a Java® Applet, or any other required application.



In Table 1, N = ceil(log2(max(image_width, image_height))).

Semantics

num_of_regions the number of regions in an image which may have additional content-related information.

region_start_x the x coordinate of the upper-left corner of a region.

region_start_y the y coordinate of the upper-left corner of a region.

region_width the width of a region.

region_height the height of a region.

link_flag a 1-bit flag which indicates the existence of links for a region. '1' indicates there are links attached to this region and '0' indicates none.

meta_flag a 1-bit flag which indicates the existence of meta information for a region. '1' indicates there is meta information and '0' indicates none.

voice_flag a 1-bit flag which indicates the existence of voice annotation for a region. '1' indicates there is voice annotation and '0' indicates none.

boundary_flag a 1-bit flag which indicates the existence of accurate boundary information for a region. '1' indicates there is boundary information and '0' indicates none.

security_flag a 1-bit flag which indicates the existence of security-copyright information for a region. '1' indicates there is such information and '0' indicates none.

mpeg7_flag a 1-bit flag which indicates the existence of references to MPEG-7 descriptors for a region. '1' indicates there is MPEG-7 reference information and '0' indicates none.

The above syntax suggests that the base layer is light weight. With 256 bytes, for example, the base layer may define at least 26 regions anywhere in an image whose size may be as large as 65,536x65,536 pixels. To define 4 regions in an image, the base layer consumes only 38 bytes.

SECOND LAYER SYNTAX

The second layer carries actual content-related information which, for each region, may include links, meta information, voice annotation, boundary information, security-copyright information, and MPEG-7 reference information. The high-level syntax of Table 2 may be used to store the above information in the second layer.

TABLE 2

Syntax                                      Bits   Mnemonic
for (n = 0; n < num_of_regions; n++) {
    links()
    meta()
    voice()
    boundary()
    security()
    mpeg7()
    end_of_region                           16     bslbf
}



The links and meta information are textual data and require lossless coding. The voice information may be coded using one of the existing sound compression formats, such as delta pulse coded modulation (DPCM). The boundary information may utilize the shape coding techniques developed in MPEG-4, "Description of Core Experiments on Shape Coding in MPEG-4 Video," ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, N1584, March 1997. The security-copyright information may utilize certain encryption techniques. The earlier cited MPEG-7 reference information contains certain types of links to the future description streams developed in MPEG-7.

The exact syntax and format for each type of the above-identified content-related information may be determined during the course of file format development for future standards, and are presented herein merely as exemplars of the system and method of the invention. In general, however, the syntax structure of Table 3 may be used.
TABLE 3

Syntax                                      Bits   Mnemonic
type_of_info                                8      bslbf
length_of_data                              16     uimsbf
data()







Semantics

links() the sub-syntax for coding links.

meta() the sub-syntax for coding meta information.

voice() the sub-syntax for coding voice annotation.

boundary() the sub-syntax for coding boundary information.

security() the sub-syntax for coding security-copyright information.

mpeg7() the sub-syntax for coding MPEG-7 reference information.

end_of_region a 16-bit tag to signal the end of content-related information for a region.

type_of_info an 8-bit tag to uniquely define the type of content-related information. The value of this parameter may be one of a set of numbers defined in a table which lists all types of content-related information such as links, meta information, voice annotation, boundary information, security-copyright information, and MPEG-7 reference information.

length_of_data the number of bytes used for storing the content-related information.

data() the actual syntax to code the content-related information. This may be determined on the basis of application requirements, or in accordance to the specifications of a future file format that may support the hierarchical data structure as one of its native features.
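The type/length/data structure of Table 3, wrapped in the per-region loop of Table 2, might be encoded along the following lines in Python. The numeric type codes and the value of the end_of_region tag are not given in the text, so `TYPE_CODES` and `END_OF_REGION` below are purely hypothetical placeholders, as are the function names.

```python
import struct

# Hypothetical type codes; the text only says they come from a defined table.
TYPE_CODES = {"links": 1, "meta": 2, "voice": 3,
              "boundary": 4, "security": 5, "mpeg7": 6}
TYPE_NAMES = {code: name for name, code in TYPE_CODES.items()}

# Hypothetical 16-bit end tag. No type code above starts with 0xFF,
# so the tag cannot be confused with the start of an item.
END_OF_REGION = b"\xff\xff"

def encode_item(kind, payload):
    """type_of_info (8 bits) + length_of_data (16 bits) + data."""
    if len(payload) > 0xFFFF:
        raise ValueError("length_of_data is a 16-bit field")
    return struct.pack(">BH", TYPE_CODES[kind], len(payload)) + payload

def encode_region(items):
    """Concatenate the items of one region and close it with end_of_region."""
    return b"".join(encode_item(kind, data) for kind, data in items) + END_OF_REGION

def decode_region(buf, offset=0):
    """Return (items, next_offset) for the region starting at offset."""
    items = []
    while buf[offset:offset + 2] != END_OF_REGION:
        code, length = struct.unpack_from(">BH", buf, offset)
        start = offset + 3
        items.append((TYPE_NAMES[code], buf[start:start + length]))
        offset = start + length
    return items, offset + 2
```

A length-prefixed layout of this kind lets a reader skip over information types it does not need without interpreting their contents.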
A few examples which demonstrate some typical use of the 
functionalities are now presented. 
Content-Based Information Retrieval 

Attaching additional information, such as voice annota- 
tion and URL links to regions/objects in an image allows a 
user to interact with the image in a more interesting way. It 
adds a new dimension to the way we view and utilize still 
images. FIG. 2 depicts a scenario where an image with such 
functionalities, i.e., an information enhanced image, is dis- 
played. The application reads the image data as well as the 
base layer information. It then displays the image and 
visually indicates the "hot spots" via an overlay on the 
image, according to the region information in the base layer. 
A user clicks on a region/object which the user may be 
interested in. A pop-up menu appears which lists items that
are available for the selected region/object. When the user 
selects the voice annotation item, for example, the applica- 
tion will then locate the sound information in the second 
layer and play it back using a default sound player applica- 
tion. If the user selects a link which is a URL link to a Web 
site 52, the system will then locate the address and display 
the corresponding Web page in a default Web browser. A link 
may also point to another image file or even point to another 
region/object in an image. Similarly, additional meta infor- 
mation may also be retrieved and viewed (in a variety of 
different forms) by the user by simply selecting the corre- 
sponding item from the menu, such as a media player 54. 

Using the method described above, different regions/ 
objects in the same image may have different additional 
information attached. A user is able to hear different voices 
corresponding to different characters in the image, for 
instance. Individual Web pages may also be attached directly 
to more relevant components in the scene, respectively. 
Object-Based Image Editing

When images are edited, it is desirable to cut/copy/paste in terms of objects having arbitrary shapes. The proposed method supports such functionality provided additional shape information is coded. FIG. 3 depicts an example whereby using the boundary information 60 associated with a baby object 62, a user may copy baby object 62, and place it into a different background 64, thus, moving one computer-generated image into another computer-generated image. The sequence of actions may happen in the following order. The user first clicks on baby object 62 and the system pops up a menu 66. The user then selects the boundary item 68, which is generated by a boundary generation mechanism in the system. The system then loads the boundary information and highlights the baby object, as is indicated by the bright line about the object. The user may then copy and paste 70 the baby object by either performing drag and drop type 72 of action, or by selecting the copy and paste functions from the edit menu 70.
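The copy-and-paste step can be pictured with a small Python sketch. It assumes the boundary information has already been decoded into a binary mask the same size as the object's bounding rectangle, which the text does not specify; images are plain nested lists of pixel values, and `paste_object` is a hypothetical helper name.

```python
def paste_object(src, mask, dst, top, left):
    """Copy the pixels of src selected by mask into a copy of dst.

    src and mask are equally sized 2-D lists covering the object's bounding
    rectangle; (top, left) is where that rectangle lands in dst. Pixels
    falling outside dst are clipped. dst itself is left unmodified.
    """
    out = [row[:] for row in dst]
    for r, mask_row in enumerate(mask):
        for c, inside in enumerate(mask_row):
            rr, cc = top + r, left + c
            if inside and 0 <= rr < len(out) and 0 <= cc < len(out[rr]):
                out[rr][cc] = src[r][c]
    return out
```

Only pixels inside the mask are transferred, which is what allows an arbitrarily shaped object to sit on a new background without carrying its original surroundings along.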
Content-Based Retrieval of Images

By associating MPEG-7 descriptors to images, the images may be retrieved based on their graphical contents by advanced search engines. The descriptors may include color, texture, shape, as well as keywords, as to be determined in MPEG-7. In general, an image only needs to carry light-weight reference information which points to the MPEG-7 description stream.

An integrated method to support the advanced functionalities of content-based information retrieval and object-based image editing has been disclosed. The method employs a two-layer hierarchical data structure to store the content-related information. The first layer carries coordinates which specify regions of interest in rectangular shape and flags which indicate whether certain additional content-related information is available for the specified regions. The actual content-related information is stored in the second layer where one may find links, meta information, voice annotation, boundary information, security-copyright information, and MPEG-7 reference information for each specified region.

The first layer is designed to be light weight, i.e., at most 256 bytes. This ensures that the downloading and storage efficiency of a compressed image may be essentially intact unless a user explicitly requires additional content-related information. On the other hand, should the user require such information, our proposed method also guarantees it may be fully delivered.

The existing JPEG compressed image file formats, such as still picture interchange file format (SPIFF) or JPEG File Interchange Format (JFIF), do not inherently support object-based information embedding and interactive retrieval of such information. Although creation and experiencing and utilization of information enhanced images may be performed using the method and system of the current invention, it may be desirable that the information enhanced images created by the current invention may be at least decoded and displayed by legacy viewers using JFIF or SPIFF. Indeed the legacy systems will not be able to recognize and utilize the associated information as the invention system would. The goal is therefore to guarantee successful image decoding and display by a legacy system without breaking down the legacy system.

If backward compatibility with legacy viewers, such as those that utilize JFIF and SPIFF file formats, is a necessity, the disclosed hierarchical data structure may be encapsulated into a JFIF or SPIFF file format. Examples of such encapsulations that may be implemented by module 17 in FIG. 1 are given below.
In the case of the JFIF file format (Graphics File Formats: Second Edition, by J. D. Murray and W. VanRyper, O'Reilly & Associates Inc., 1996, pp. 510-515), referring now to FIG. 5, a JFIF file structure is shown generally at 90. The JFIF file format contains JPEG data 92 and an End Of Image (EOI) marker 94. A JFIF viewer simply ignores any data that follows the EOI marker. Hence, if the 2-layer hierarchical data structure 96 disclosed herein is appended to a JFIF file immediately after EOI 94, the legacy viewers will be able to decode and display the image, ignoring the additional data structure. A system constructed according to the current invention may appropriately interpret the additional data and implement the interactive functionalities of the invention.
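A minimal Python sketch of the append-after-EOI idea follows. It is an assumption-laden illustration, not the patent's module 17: the helper names are hypothetical, and the marker walk handles only the common JPEG structure (length-prefixed segments, byte-stuffed scan data, restart markers). Walking the markers, rather than searching for the bytes FF D9, avoids being misled by an EOI inside an embedded thumbnail or inside the appended payload itself.

```python
def find_eoi_end(data):
    """Return the offset just past the EOI marker of the primary JPEG stream."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    i = 2
    while i + 1 < len(data):
        if data[i] != 0xFF:
            raise ValueError("expected a marker")
        marker = data[i + 1]
        if marker == 0xFF:                      # fill byte before a marker
            i += 1
            continue
        if marker == 0xD9:                      # EOI
            return i + 2
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            i += 2                              # markers without a length field
            continue
        seg_len = int.from_bytes(data[i + 2:i + 4], "big")
        i += 2 + seg_len                        # skip the whole segment
        if marker == 0xDA:                      # SOS: skip entropy-coded data
            while i + 1 < len(data):
                nxt = data[i + 1]
                if data[i] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break                       # a real marker follows
                i += 1
    raise ValueError("EOI marker not found")

def append_layers(jpeg, layers):
    """Place the hierarchical data immediately after EOI, replacing any old tail."""
    return jpeg[:find_eoi_end(jpeg)] + layers

def split_layers(data):
    """Return (image_bytes, appended_bytes)."""
    end = find_eoi_end(data)
    return data[:end], data[end:]
```

A legacy viewer would read only the first part returned by `split_layers`, while an aware application would go on to parse the appended bytes.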

Using SPIFF, the hierarchical data structure may be encapsulated using a private tag, known to the system of the current invention. Since a legacy viewer will ignore non-standard tags and associated information fields, according to the SPIFF specification, images may be successfully decoded and displayed by SPIFF-compliant legacy systems. The system of the invention will then recognize and appropriately utilize the added data to enable its interactive functionalities. (Another more accessible reference for SPIFF is: Graphics File Formats: Second Edition, by J. D. Murray and W. VanRyper, O'Reilly & Associates Inc., 1996, pp. 822-837.)

The method may be applied to any existing computing environment. If an image file is stored in a local disk, the proposed functionalities may be realized by a stand-alone image viewer or any application which supports such functionalities, without any additional system changes. If the image file is stored remotely on a server, the proposed functionalities may still be realized by any application which supports such functionalities on the client side, plus an image parser module on the server. The reason the server needs to include an image parser is that the additional content-related information resides in the same file as the image itself. When a user requests certain content-related information regarding a selected region/object in an image, e.g., its meta information, it is important that the system will fetch only that piece of information and present it to the user as fast as possible. To achieve this objective, the server has to be able to parse an image file, locate and transmit any piece of content-related information specified by the client.

To implement the above without any enhancement on a currently existing network server, each content-related information has to be stored in a separate file, as shown in FIG. 4, generally at 80. Therefore, for each defined region, there may be as many as six files, which contain links, meta information, voice annotation, boundary information, security-copyright information, and MPEG-7 reference information, respectively. For a given image, say my_image.jpg, a directory called my_image.info which contains content-related information for N defined regions is created and stored in:

region01.links
region01.meta
region01.voice
region01.boundary
region01.security
region01.mpeg7

region0N.links
region0N.meta
region0N.voice
region0N.boundary
region0N.security
region0N.mpeg7



Of course, the solution of using separate files to store
additional information is fragile and messy in practice. A
simple mismatch between the file names due to a name
change would cause the complete loss of the content-related
information.
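
The separate-file layout described above may be sketched as follows. This is a minimal illustration only, not part of the disclosed system: the helper names info_dir_for and region_file are hypothetical, the two-digit region numbering is assumed from the region01 naming shown above, and the .info directory is derived from the image name as in the my_image.jpg example.

```python
from pathlib import Path

# The six kinds of content-related information listed in the text.
INFO_TYPES = ("links", "meta", "voice", "boundary", "security", "mpeg7")


def info_dir_for(image_path: str) -> Path:
    """Map my_image.jpg to the sibling directory my_image.info (hypothetical helper)."""
    p = Path(image_path)
    return p.with_name(p.stem + ".info")


def region_file(image_path: str, region: int, info_type: str) -> Path:
    """Path of one per-region file, e.g. my_image.info/region01.meta."""
    if info_type not in INFO_TYPES:
        raise ValueError(f"unknown information type: {info_type}")
    # Two-digit numbering is an assumption taken from the region01 example.
    return info_dir_for(image_path) / f"region{region:02d}.{info_type}"


# The association is carried by file names alone, so renaming the image
# without renaming its directory makes the lookup point somewhere else.
before = region_file("my_image.jpg", 1, "meta")
after = region_file("my_photo.jpg", 1, "meta")
```

Because nothing but the shared name ties the image to its directory, the two paths above differ after a rename, which is the mismatch the preceding paragraph warns about.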

"Images" in this invention may correspond to frames of
digital video sequences, for example to a set of frames that
are most representative of the video content. It should also
be noted that the image-content information may be compressed
to improve storage efficiency and download speed.
This may be performed by state-of-the-art compression
methods. Shape information may be compressed, for
instance, using the method included in the MPEG-4 standard.
In this case, the viewing application should be equipped with
the appropriate decompression tools.

The invention has the following advantages over the
known prior art: (1) it is object-based and thus flexible; (2)
it allows for inclusion of object feature information, such as
object shape boundary; (3) it has a hierarchical data structure
and hence does not burden in any way those applications
that choose not to download and store image-content
related information; (4) it allows audiovisual realization of
object-based information, at users' request; (5) it allows for
inclusion of URL links and hence provides an added dimensionality
to the enjoyment and utilization of digital images (the
URL links may point to web pages related to the image
content, such as personal web pages, product web pages, and
web pages for certain cities, locations, etc.); and (6) it is
generic and applicable to any image compression technique
as well as to uncompressed images. By the same token, it
may provide object-based functionalities to any forthcoming
compression standards, such as JPEG 2000. Although none
of the current file formats inherently supports the method and
the system disclosed herein, methods of implementing the
system in a backward compatible manner, where legacy
systems may at least decode the image data and ignore the
added information, have been disclosed.

Data structures configured in the manner described in the
invention may be downloaded over a network in a selective
fashion so as not to burden applications that are only interested in
the image data but not the content information. The downloading
application checks with the user interactively
whether the user desires to download and store the content
information. If the user says "No", the application retrieves
only the image data and the base layer and sets the flags in
the base layer to zero, indicating that there is no content
information with the image.
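
The selective download just described may be sketched as follows. This is an illustrative sketch under stated assumptions rather than the disclosed implementation: the Region and EnhancedImage classes, the download function, the bounding-box layout, and the flag names are all hypothetical stand-ins for the base layer (indicators only) and the second layer (actual content-related information).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Region:
    # Bounding box as (x, y, width, height); this layout is an assumption.
    box: Tuple[int, int, int, int]
    # One indicator per information type, e.g. {"links": True, "voice": False}.
    flags: Dict[str, bool]


@dataclass
class EnhancedImage:
    image_data: bytes
    base_layer: List[Region]
    # Second layer: (region index, information type) -> payload bytes.
    second_layer: Dict[Tuple[int, str], bytes]


def download(source: EnhancedImage, want_content: bool) -> EnhancedImage:
    """Return what the client keeps after asking the user about content information."""
    if want_content:
        return source
    # User said "No": keep image data and base layer, but zero every flag
    # so that viewers see no content information attached to the image.
    cleared = [Region(r.box, {name: False for name in r.flags})
               for r in source.base_layer]
    return EnhancedImage(source.image_data, cleared, {})


source = EnhancedImage(
    image_data=b"\xff\xd8 placeholder image bytes",
    base_layer=[Region((10, 20, 100, 40), {"links": True, "voice": True})],
    second_layer={(0, "links"): b"http://example.com", (0, "voice"): b"audio"},
)
image_only = download(source, want_content=False)
```

In this sketch the region geometry survives in the base layer while every indicator is cleared and the second layer is dropped, so an application interested only in the image data carries no content payload.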

The method and system also support scalable image
compression/decompression algorithms. In quality-scalable
compression, the image may be decoded at various
quality levels. In spatially scalable compression, the image
may be decoded at different spatial resolutions. In the case of
compression algorithms that support scalability, only the
region information and object contour need to be scaled to
support spatial scalability. All other types of data stay intact.
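
The scaling of region information and object contour for a lower spatial resolution may be sketched as follows. This is a minimal sketch under assumptions not stated in the text: regions are taken to be boxes of the form (x, y, width, height), contours are taken to be lists of (x, y) pixel coordinates, and the function names scale_box and scale_contour are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (x, y, width, height); an assumed layout


def scale_box(box: Box, factor: float) -> Box:
    """Scale a region's bounding box to match an image decoded at another resolution."""
    x, y, w, h = box
    return (round(x * factor), round(y * factor),
            round(w * factor), round(h * factor))


def scale_contour(contour: List[Point], factor: float) -> List[Point]:
    """Scale every contour vertex by the same factor as the image."""
    return [(round(x * factor), round(y * factor)) for x, y in contour]


# Image decoded at half resolution: only the geometry changes.
box = (10, 20, 100, 40)
contour = [(10, 20), (110, 20), (110, 60), (10, 60)]
half_box = scale_box(box, 0.5)
half_contour = scale_contour(contour, 0.5)
```

Links, meta information, voice annotation, and the other non-geometric data are not touched by either function, consistent with the statement that all other types of data stay intact.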

Although a preferred embodiment of the system and
method of the invention has been disclosed, it will be
appreciated by those of skill in the art that further variations 
and modifications may be made thereto without departing 
from the scope of the invention as defined in the appended 
claims. 
We claim:

1. A hierarchical system for associating object-based 
information with images for information retrieval, editing, 
and manipulation, comprising: 

an object-based selection mechanism for selecting an 
object of interest in an image; 

hierarchical data structure generation means for generat- 
ing a hierarchical data structure for said image and for 
associating object-based information with said image; 

a transmission/storage mechanism for storing the image 
and the hierarchical data structure; 

a retrieval and manipulation mechanism for allowing a 
user selectively to retrieve and manipulate the image 
and the object-based information associated therewith; 
and 

a generation mechanism for generating an overlay asso- 
ciated with said image, wherein said overlay includes at 
least one hot spot which is visually distinguishable 
from the remainder of the image when highlighted by 
the user. 

2. The system of claim 1 which includes an image 
acquisition mechanism for acquiring an image. 

3. The system of claim 1 which includes a display 
mechanism for displaying the image to a user. 

4. The system of claim 3 wherein said display mechanism 
is constructed and arranged to display said hierarchical data 
structure to a user. 

5. The system of claim 1 which includes a storage 
mechanism for storing an image. 

6. The system of claim 1 which includes a database 
containing a collection of digital images therein. 

7. The system of claim 1 wherein said image and said 
hierarchical data structure for said image are stored in a 
single file. 

8. The system of claim 1 wherein said image and said
hierarchical data structure for said image are stored in 
separate files. 

9. The system of claim 1 wherein said generation mecha- 
nism generates boundary information for identifying a 
boundary about an object of interest, and wherein said 
boundary groups all of the information within said boundary 
for manipulation by the user.

10. The system of claim 1 which includes an audiovisual 
realization mechanism wherein object-based information is 
visually displayed to the user, and audibly played to the user, 
upon the user's request. 

11. The system of claim 1 which includes an audiovisual 
realization mechanism wherein object-based information is 
used for object-based image editing.

12. The system of claim 1 wherein said hierarchical data 
structure includes a base layer which includes only content- 
related information indicators, and a second layer which 
includes content-related information. 

13. A hierarchical system for associating object-based 
information with images for information retrieval, editing, 
and manipulation, comprising: 

an image acquisition mechanism for acquiring an image; 

an object-based selection mechanism for selecting an 
object of interest in said image; 

hierarchical data structure generation means for generat- 
ing a hierarchical data structure for said image and for 
associating object-based information with said image, 
thereby forming an information-enhanced image; 

a transmission/storage mechanism for storing the 
information-enhanced image;

a display mechanism for displaying the information- 
enhanced image to a user; 
a retrieval and manipulation mechanism for allowing a
user selectively to retrieve and manipulate the image 
and the object-based information associated therewith; 
and 

a generation mechanism for generating boundary information
for identifying a boundary about an object of
interest, and wherein said boundary groups all of the 
information within said boundary for manipulation by 
the user. 

14. The system of claim 13 wherein said display mecha- 
nism is constructed and arranged to display said hierarchical 
data structure of the information-enhanced image to a user. 

15. The system of claim 13 which includes a database
containing a collection of digital images therein. 

16. The system of claim 13 wherein said image and said 
hierarchical data structure for said image are stored in a 
single file. 

17. The system of claim 13 wherein said image and said 
hierarchical data structure for said image are stored in 
separate files. 

18. The system of claim 13 wherein said generation 
mechanism generates an overlay associated with said image, 
and wherein said overlay includes at least one hot spot which 
is visually distinguishable from the remainder of the image
when highlighted by the user.

19. The system of claim 13 which includes an audiovisual 
realization mechanism wherein object-based information is 
visually displayed to the user, and audibly played to the user, 
upon the user's request. 

20. The system of claim 13 which includes an audiovisual 
realization mechanism wherein object-based information is 
used for object-based image editing. 

21. The system of claim 13 wherein said hierarchical data 
structure includes a base layer which includes only content-related
information indicators, and a second layer which
includes content-related information. 

22. A hierarchical method for associating object-based 
information with images for information retrieval, editing, 
and manipulation, comprising: 

selecting an object of interest in an image with an object-based
selection mechanism;

generating a hierarchical data structure for said image and
for associating object-based information with the
image, including generating a base layer which
includes only content-related information indicators,
and generating a second layer which includes content-related
information;

selectively retrieving and manipulating the image and the
information associated therewith, including:

(a) retrieving the image data;

(b) reading the base layer information;

(c) displaying the image;

(d) generating an overlay to visually indicate the
regions that contain object-based information in
terms of "hot spots", according to the region
information contained in the base layer;

(e) displaying pop-up menus as the user points and
clicks on the hot spots, wherein the types of available
object-based information are featured in the menus;
and

(f) retrieving and rendering the object-based information
selected by the user when the user clicks on the
appropriate entry in the menu; and

transmitting/storing the image and the hierarchical data
structure.

23. The method of claim 22 which includes acquiring an
image with an image acquisition mechanism. 
24. The method of claim 22 which includes displaying the
transmitted/stored image to a user. 

25. The method of claim 22 which further includes 
displaying visually object-based information and playing
audibly object-based information to the user, upon the user's
request. 

26. The method of claim 22 which includes using object- 
based information for object-based image editing. 
27. The method of claim 22 wherein said generating an
overlay includes highlighting a hot spot when user's point- 
ing device points at a location within the area of that region. 

28. The method of claim 22 wherein said generating an 
overlay includes identifying a boundary about an object of 
interest. 